-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Add native plan checker and native endpoint for Velox plan conversion #23596
base: master
Are you sure you want to change the base?
Conversation
Making this PR a draft as recent discussions will likely require a change to send the |
5b6bb75
to
9396c95
Compare
d5941a7
to
9375271
Compare
I have separated out the first part of this, adding the session property to enable building and validating the plan eagerly at #23649 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Changes to SPI to support custom PlanChecker](https://github.com/prestodb/presto/pull/23596/commits/27b0c2ef2b4758de723712d8ad0cc142d9757b02)
✅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to Java Presto main and server to support custom PlanChecker SPI
❓
@@ -371,6 +377,8 @@ else if (serverConfig.isCoordinator()) { | |||
driftClientBinder(binder).bindDriftClient(ThriftServerInfoClient.class, ForNodeManager.class) | |||
.withAddressSelector(((addressSelectorBinder, annotation, prefix) -> | |||
addressSelectorBinder.bind(AddressSelector.class).annotatedWith(annotation).to(FixedAddressSelector.class))); | |||
// NodeManager instance for plugins to use | |||
binder.bind(NodeManager.class).to(ConnectorAwareNodeManager.class).in(Scopes.SINGLETON); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than inject this, I would just construct it where needed.
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
Outdated
Show resolved
Hide resolved
...cebook/presto/sql/planner/sanity/plancheckerprovidermanagers/PlanCheckerProviderManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add plan checker provider and plugin for NativePlanChecker
❓
.../src/main/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanChecker.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanChecker.java
Outdated
Show resolved
Hide resolved
.../src/main/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanChecker.java
Outdated
Show resolved
Hide resolved
...n/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanCheckerProvider.java
Outdated
Show resolved
Hide resolved
|
||
// Create static taskId and empty TableWriteInfo needed for plan conversion | ||
protocol::TaskId taskId = "velox-plan-conversion.0.0.0"; | ||
auto tableWriteInfo = std::make_shared<protocol::TableWriteInfo>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: this causes a problem for create table, need to address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be a problem anymore, correct me if I'm missing anything here
I just encountered this again, so looks like there is still an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed with @tdcmeehan and since TableWriteInfo
is not built until after the planning stage, we need to have an alternative to handle fragments with a TableWriterNode
. Some alternatives discussed:
- Adding a field to the TableWriteInfo to indicate that the converter should proceed and skip the required fields, like
ExecutionWriterTarget
- Make a dummy TableWriteInfo with the fields populated enough to pass the conversion
- Skip the TableWriteNode portion of the fragment, the source node can continue with the conversion
All of these options would result in an incomplete conversion of a fragment with a TableWriteNode, but since that is not available at the time the plan validation happens, there isn't much we can do. Here I implemented option (3) which is the least intrusive to the plan conversion.
presto-native-execution/presto_cpp/main/types/PrestoToVeloxQueryPlan.cpp
Outdated
Show resolved
Hide resolved
if (error.empty()) { | ||
http::sendOkResponse(downstream, json(R"({ "status": "ok" })")); | ||
} else { | ||
http::sendErrorResponse(downstream, json(R"({ "status": "error", "message": ")" + error + R"(")})")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this look like on the user side. e.g. can we report different kinds of error messages depending on what the failure was? For regular tasks, i think we send and OkResponse for most kinds of failures, and put the error information in the TaskInfo. Maybe we should do something similar here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make the response however you think is best. I didn't put too much thought into the message since we discussed that this would eventually be sending the converted plan back. Are you saying if the plan validation fails you would send an OkResponse and not an ErrorResponse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think we should follow the model for how errors get passed along in general for native workers, which i think is an OkResponse with the error information in the TaskInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rschlussel if I understand correctly, you are asking for the response to be a TaskInfo
message with ExecutionFailureInfo
to be populated if the conversion fails?
On the Java side, TaskInfo
and many of the related classes are not part of the SPI, so things will need to be moved around. Is that what we want here?
On the C++ side, there is lots of information in the TaskInfo
struct, I'm not sure how much is really useful in this case and if it needs to be populated with the TaskManager
and have a corresponding PrestoTask
.
Would it make sense to define a simple response message? cc @tdcmeehan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the difference with tasks is that 4xx is returned when we can not create the task. What is being reported by the TaskInfo is whether or not the already successfully created task ended up failing. That should be returned as a 200, because the call to check on the task status itself succeeded, and it is merely reporting the underlying failure. When we go to update or create the task, when 4xx is returned, we are returning that only when the task cannot be created.
In this case, what is being returned is the converted plan. That should probably return code 422, because we are returning whether or not the conversion failed. Like all 4xx error codes, this indicates the problem is with the input. It's similar to a task update or creation failing.
String responseBody = response.body() != null ? response.body().string() : "{}"; | ||
LOG.error("Native plan checker failed with code: %d, response: %s", response.code(), responseBody); | ||
if (config.isQueryFailOnError()) { | ||
throw new PrestoException(QUERY_REJECTED, "Query failed by native plan checker with code: " + response.code() + ", response: " + responseBody); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QUERY_REJECTED seems like the wrong error code. We should propagate the error from the side car that caused the failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean the response from the side car should have an error code we use here instead of QUERY_REJECTED
? The error message is propagated and shown to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error that the user sees shouldn't be QUERY_REJECTED. It should be whatever error caused the plan checking to fail.
.../src/main/java/com/facebook/presto/plancheckerproviders/nativechecker/NativePlanChecker.java
Outdated
Show resolved
Hide resolved
try (Response response = httpClient.newCall(request).execute()) { | ||
if (!response.isSuccessful()) { | ||
String responseBody = response.body() != null ? response.body().string() : "{}"; | ||
LOG.error("Native plan checker failed with code: %d, response: %s", response.code(), responseBody); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think we need to log the error if we're failing the query with that error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is fine, I just want to make sure it's clear that if failed due to native plan checker, so if the exception message contains that and is logged, then that should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it doesn't, it should be changed so that it does. It's much more helpful to have the failure reason show up in the query error than to have to scour the coordinator logs for any relevant error messages.
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/sanity/PlanChecker.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/com/facebook/presto/spi/plan/PlanCheckerProvider.java
Outdated
Show resolved
Hide resolved
9375271
to
873cab6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull branch, new local doc build, looks good. Thanks!
Adding new SPI to support plugin custom plan checkers to be provided for validating intermediate, final, and fragment stages of a logical plan. The motivation for this is to allow for a native plan checker that will eagerly validate a plan on the native sidecar to quickly fail the query if there are incompatibilities. See also: prestodb#23596 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
This adds a new SPI to support plugin custom plan checkers to be provided for validating intermediate, final, and fragment stages of a logical plan. The motivation for this is to allow for a native plan checker that will eagerly validate a plan on the native sidecar to quickly fail the query if there are incompatibilities. Add unit test to verify that a queued query can be validated and fail while queued. This is done by using the new custom plan checker SPI to add plugin that will trigger a failure when validating the plan. See also: prestodb#23596 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
This adds a new SPI to support plugin custom plan checkers to be provided for validating intermediate, final, and fragment stages of a logical plan. The motivation for this is to allow for a native plan checker that will eagerly validate a plan on the native sidecar to quickly fail the query if there are incompatibilities. Add unit test to verify that a queued query can be validated and fail while queued. This is done by using the new custom plan checker SPI to add plugin that will trigger a failure when validating the plan. See also: #23596 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
Apologies for the delay in response @tdcmeehan and @rschlussel , I was focused on getting the other pr ready. I think I addressed most of this feedback, just a couple remaining things, and I'll do some testing then remove from draft soon. |
873cab6
to
7d39e04
Compare
This adds a new SPI to support plugin custom plan checkers to be provided for validating intermediate, final, and fragment stages of a logical plan. The motivation for this is to allow for a native plan checker that will eagerly validate a plan on the native sidecar to quickly fail the query if there are incompatibilities. Add unit test to verify that a queued query can be validated and fail while queued. This is done by using the new custom plan checker SPI to add plugin that will trigger a failure when validating the plan. See also: prestodb#23596 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
This adds a new SPI to support plugin custom plan checkers to be provided for validating intermediate, final, and fragment stages of a logical plan. The motivation for this is to allow for a native plan checker that will eagerly validate a plan on the native sidecar to quickly fail the query if there are incompatibilities. Add unit test to verify that a queued query can be validated and fail while queued. This is done by using the new custom plan checker SPI to add plugin that will trigger a failure when validating the plan. See also: prestodb#23596 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
8cf9227
to
111d6b4
Compare
111d6b4
to
09d9219
Compare
09d9219
to
f87517b
Compare
f87517b
to
33a6427
Compare
planFragment.root)) { | ||
// TableWriteInfo is not yet built at the planning stage, so we can not | ||
// fully convert a TableWriteNode and skip that node of the fragment. | ||
converter.toVeloxQueryPlan(writeNode->source, tableWriteInfo, taskId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler : You have to call the VeloxPlanValidator as well https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/types/VeloxPlanValidator.cpp#L35
We can eventually remove it after this plan checker is in production.
@@ -1853,7 +1855,9 @@ core::PlanFragment VeloxQueryPlanConverterBase::toVeloxQueryPlan( | |||
return planFragment; | |||
} | |||
case protocol::SystemPartitionFunction::HASH: { | |||
auto numPartitions = partitioningScheme.bucketToPartition->size(); | |||
auto numPartitions = partitioningScheme.bucketToPartition != nullptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : You can use shorthand for != nullptr check
auto numPartitions = partitioningScheme.bucketToPartition ? partitioningScheme.bucketToPartition->size() : 1;
planFragment.root)) { | ||
// TableWriteInfo is not yet built at the planning stage, so we can not | ||
// fully convert a TableWriteNode and skip that node of the fragment. | ||
converter.toVeloxQueryPlan(writeNode->source, tableWriteInfo, taskId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler @tdcmeehan : A design point to keep in mind is the extent to which this code might catch errors.
For the most part the Presto -> Velox plan conversion would catch many cases.
But there is also a category of errors that happen when the operators are constructed from the Velox PlanNode(s). e.g. Scalar, aggregate, window function objects get created only on the Project/HashAgg//Window operator initialization. So certain function validations happen only then.
Take a look at https://github.com/facebookincubator/velox/blob/main/velox/exec/LocalPlanner.cpp#L416.
As we gain more experience running this code, its likely we will want to abstract some part of the Driver creation as well for the plan validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We knew that this wouldn't catch all errors, but hopefully catch most early enough during queueing to be useful. It will be great to categorize what native errors are not caught by this and see what else can be done about those.
Thanks for reviewing @aditi-pandit , I can look into the additional Velox plan validation. |
33a6427
to
06b1c2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add native plan checker to sidecar plugin
✅
return planConversionResponse.getFailures().get(0); | ||
} | ||
else { | ||
throw new PrestoException(GENERIC_INTERNAL_ERROR, "Error response without failure from native plan checker with code: " + response.code()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this a dedicated internal error. We can make a new instance of ErrorCodeSupplier for this plugin and add this there.
return injector.getInstance(PlanCheckerProvider.class); | ||
} | ||
catch (IOException e) { | ||
throw new RuntimeException(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UncheckedIOException
super basic question, but i'm not seeing where the code is that parses the config |
@rschlussel do you mean this code? 70db96c#diff-7fa44d7637e8a97126d21846641c8aad854e8a82f062b8b839bf6a14f5bc03deR87-R105 |
|
||
@JsonCreator | ||
public PlanConversionResponse( | ||
@JsonProperty("status") String status, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand, I'm wondering what this status gives us that the HTTP response code doesn't.
On the other hand, it seems like we'd want the actual converted response back from the endpoint, because eventually we'd need this for the explain plan feature. But I guess we ca do this in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this, what does the code return that the HTTP status code doesn't give us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't see the previous comment. Yes, it redundant. I only added so that there could be something in the response message to indicate if it was successful or not, but you could also infer that based on if there is a failure in the message or the HTTP status code.
I did figure this response message would change anyway once there is a need to get the converted plan back, but I'll go ahead and remove this status for now.
yes, thank you! I knew I'd seen it and then couldn't find it again. I think this isn't so standard in Presto to have each plugin manage it's own configuration (and also this will get loaded always instead of only if there is a config file for it, which also means that if the default for that config changes from false to true, then it will get enabled everywhere, even if you didn't configure the plugin). e.g. FunctionNamespaceManager we have StaticFunctionNamespaceStore and for connectors we have StaticCatalogStore that handles setting the directories, and listing the configuration files and then registering the catalogs/function namespace if they are loaded. We should follow something similar for plan checkers. (that change doesn't need to block this, though) |
I'll make this change |
06b1c2c
to
c02b34a
Compare
@aditi-pandit and @rschlussel , could you please review the changes on the native side again? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java code LGTM % couple questions/nits
...idecar-plugin/src/main/java/com/facebook/presto/sidecar/nativechecker/NativePlanChecker.java
Outdated
Show resolved
Hide resolved
|
||
@JsonCreator | ||
public PlanConversionResponse( | ||
@JsonProperty("status") String status, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this, what does the code return that the HTTP status code doesn't give us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to add tests as well. What is your testing plan for this code ?
Also, it would be great to have some documentation as well. What is the documentation plan ?
Thanks @BryanCutler. There is one major comment about the protocol. The rest might be less severe.
@@ -527,6 +528,15 @@ void PrestoServer::run() { | |||
prestoServerOperations_->runOperation(message, downstream); | |||
}); | |||
|
|||
httpServer_->registerPost( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this function in registerSidecarEndPoints https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/PrestoServer.cpp#L1427
try { | ||
auto headers = message->getHeaders(); | ||
|
||
std::ostringstream oss; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code to extract an std::string from const std::vector<std::unique_ptrfolly::IOBuf>& body can be a common inline function in presto_cpp/main/common/Utils.h. I've seen this code repeated in multiple places.
@@ -2546,3 +2546,11 @@ enum class NodeState { ACTIVE, INACTIVE, SHUTTING_DOWN }; | |||
extern void to_json(json& j, const NodeState& e); | |||
extern void from_json(const json& j, NodeState& e); | |||
} // namespace facebook::presto::protocol | |||
namespace facebook::presto::protocol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is code-generated. Please see the instructions at https://github.com/prestodb/presto/tree/master/presto-native-execution/presto_cpp/presto_protocol. You might have to create a special sub-directory and add this code there. Else another option is to making it a shared protocol class and adding to the yml file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit since the response is defined in a plugin, where would you recommend putting the definition? I see now the protocol has been broken up into core and connectors. Does it make sense to add a new yml file, like "presto_sidecar.yml" or something under connectors?
@Override | ||
public void validateFragment(SimplePlanFragment planFragment, WarningCollector warningCollector) | ||
{ | ||
if (!planFragment.getPartitioning().isCoordinatorOnly() && !isInternalSystemConnector(planFragment.getRoot())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some system queries do run on the worker see #21416. You should still validate those plans.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit do you know where the filtering happens where we decide whether to run a system query on the coordinator or the worker?
This adds a provider for a native plan checker that will send a plan fragment to the native sidecar where it is validated by performing a conversion to a Velox plan. If the conversion succeeds the query will continue, if it fails then the query will fail with an error from the native sidecar. The provider is added to the native sidecar plugin and is enabled with the config `native-plan-checker.plan-validation-enabled=true` from filename `etc/plan-checker-providers/native-plan-checker.properties`. See also: prestodb#23649 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
This adds an endpoint to the native Presto server that will convert a Presto plan fragment to Velox. If the conversion is successful, the server will send an ok response. If it fails, the server will send an error response with a 422 status code as unprocessable. The error message will contain an ExecutionFailureInfo struct with error type, code and message. See also prestodb#23649 RFC: https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
a9ddecb
to
e91a08b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review for the first commit.
@@ -61,9 +62,11 @@ public void addPlanCheckerProviderFactory(PlanCheckerProviderFactory planChecker | |||
} | |||
} | |||
|
|||
public void loadPlanCheckerProviders() | |||
public void loadPlanCheckerProviders(NodeManager nodeManager) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can NodeManager be injected instead along with SimplePlanFragmentSerde?
@Override | ||
public void validateFragment(SimplePlanFragment planFragment, WarningCollector warningCollector) | ||
{ | ||
if (!planFragment.getPartitioning().isCoordinatorOnly() && !isInternalSystemConnector(planFragment.getRoot())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aditi-pandit do you know where the filtering happens where we decide whether to run a system query on the coordinator or the worker?
public static class PlanConversionFailure | ||
implements ErrorCodeSupplier | ||
{ | ||
private final String type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this "type", and how is it different from Errorcode.type?
private final String type; | ||
private final String message; | ||
private final ErrorCode errorCode; | ||
private final PlanConversionFailure cause; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the cause here?
Injector injector = app | ||
.noStrictConfig() | ||
.doNotInitializeLogging() | ||
.setRequiredConfigurationProperties(getConfig()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of getting the config yourself, the properties will now be passed to the create() call in the properties field. You can just pass that in directly.
} | ||
} | ||
|
||
public static class PlanConversionFailure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd love some examples of what this will look like in various failure scenarios to understand this better.
Description
This adds a provider for a native plan checker that will send a plan fragment to the native sidecar where it is validated by performing a conversion to a Velox plan. It is added to the native sidecar plugin and is enabled with the config
native-plan-checker.plan-validation-enabled=true
from filenameetc/plan-checker-providers/native-plan-checker.properties
.A new endpoint is added to the native worker to accept the plan node fragment and return an ok response if the conversion is successful and an error response with an HTTP code 422 if failed, along with the failure information.
This follows the addition from #23649 so that when
eager-plan-validation-enabled=true
then a query can be validated while queued for correctness on a native worker.RFC for this addition has been discussed and merged at https://github.com/prestodb/rfcs/blob/main/RFC-0008-plan-checker.md
Motivation and Context
This is a useful addition to help fail queries that are incompatible with Velox before execution begins and cluster resources have been allocated.
Impact
Native plan checker enabled with config
native-plan-checker.plan-validation-enabled=true
from filenameetc/plan-checker-providers/native-plan-checker.properties
. The native plan checker is disabled by default.Test Plan
Added unit tests for Java and C++, as well as manual testing of queries that currently fail on a native worker.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.